AITopics | Kolkata

Kolkata

Gating Enables Curvature: A Geometric Expressivity Gap in Attention

arXiv.org Machine Learning, Apr-17-2026

Multiplicative gating is widely used in neural architectures and has recently been applied to attention layers to improve performance and training stability in large language models. Despite this success, the mathematical implications of gated attention mechanisms remain poorly understood. We study attention through the geometry of its representations by modeling outputs as mean parameters of Gaussian distributions and analyzing the induced Fisher-Rao geometry. We show that the ungated attention operator is restricted to intrinsically flat statistical manifolds due to its affine structure, while multiplicative gating enables non-flat geometries, including positively curved manifolds that are unattainable in the ungated setting. These results establish a geometric expressivity gap between ungated and gated attention. Empirically, we show that gated models exhibit higher representation curvature and improved performance on tasks requiring nonlinear decision boundaries, whereas they provide no consistent advantage on tasks with linear decision boundaries. Furthermore, we identify a structured regime in which curvature accumulates under composition, yielding a systematic depth amplification effect.
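
To make the ungated/gated distinction concrete, here is a minimal single-query sketch in plain Python. The elementwise sigmoid output gate and the names `attention`, `gated_attention`, and `gate_logits` are assumptions for illustration; the abstract does not specify the paper's exact gating form.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(scores, values):
    # Ungated attention for one query: a convex combination of value vectors.
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def gated_attention(scores, values, gate_logits):
    # Assumed gating form: multiply each output coordinate by sigmoid(gate_logit).
    out = attention(scores, values)
    return [o / (1.0 + math.exp(-g)) for o, g in zip(out, gate_logits)]
```

In this sketch the ungated output always lies in the convex hull of the value vectors, while the multiplicative gate can move it outside that hull. This is one informal way to read the structural difference the abstract describes, not a reproduction of the paper's argument.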

curvature, machine learning, natural language, (20 more...)

arXiv: 2604.14702

Country:

North America > United States > California (0.14)
Asia > India > West Bengal > Kolkata (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

The Theorems of Dr. David Blackwell and Their Contributions to Artificial Intelligence

Paxton, Napoleon

arXiv.org Machine Learning, Apr-9-2026

Dr. David Blackwell was a mathematician and statistician of the first rank, whose contributions to statistical theory, game theory, and decision theory predated many of the algorithmic breakthroughs that define modern artificial intelligence. This survey examines three of his most consequential theoretical results (the Rao-Blackwell theorem, the Blackwell Approachability theorem, and the Blackwell Informativeness theorem, also known as comparison of experiments) and traces their direct influence on contemporary AI and machine learning. We show that these results, developed primarily in the 1940s and 1950s, remain technically live across modern subfields including Markov Chain Monte Carlo inference, autonomous mobile robot navigation (SLAM), generative model training, no-regret online learning, reinforcement learning from human feedback (RLHF), large language model alignment, and information design. NVIDIA's 2024 decision to name its flagship GPU architecture "Blackwell" provides vivid testament to his enduring relevance. We also document an emerging frontier: explicit Rao-Blackwellized variance reduction in LLM RLHF pipelines, recently proposed but not yet standard practice. Together, Blackwell's theorems form a unified framework addressing information compression, sequential decision making under uncertainty, and the comparison of information sources: precisely the problems at the core of modern AI.
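
As a small illustration of the Rao-Blackwell idea of variance reduction by conditioning, the toy sketch below estimates P(X + Y <= 1) for independent Uniform(0, 1) variables. The problem and the function name `crude_and_rao_blackwell` are chosen here for illustration and do not come from the survey.

```python
import random
import statistics

def crude_and_rao_blackwell(n, seed=0):
    # Returns per-sample values of two unbiased estimators of P(X + Y <= 1).
    rng = random.Random(seed)
    crude, rb = [], []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        # Crude estimator: the raw indicator of the event.
        crude.append(1.0 if x + y <= 1.0 else 0.0)
        # Rao-Blackwellized: E[1{X+Y<=1} | X=x] = P(Y <= 1 - x) = 1 - x.
        rb.append(1.0 - x)
    return crude, rb

crude, rb = crude_and_rao_blackwell(20000)
crude_var = statistics.variance(crude)
rb_var = statistics.variance(rb)
```

Both estimators target the same quantity (0.5), but the conditioned version has theoretical per-sample variance 1/12 instead of 1/4, so `rb_var` should come out clearly smaller than `crude_var`.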

blackwell, machine learning, reinforcement learning, (17 more...)

arXiv: 2604.06621

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(3 more...)

Genre: Overview (1.00)

Industry:

Education (0.36)
Information Technology (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Generating DDPM-based Samples from Tilted Distributions

Mandal, Himadri, Gupta, Dhruman, Gupta, Rushil, Iyer, Sarvesh Ravichandran, Bandyopadhyay, Agniv, Bassamboo, Achal, Gupta, Varun, Juneja, Sandeep

arXiv.org Machine Learning, Apr-6-2026

Given $n$ independent samples from a $d$-dimensional probability distribution, our aim is to generate diffusion-based samples from a distribution obtained by tilting the original, where the degree of tilt is parametrized by $\theta \in \mathbb{R}^d$. We define a plug-in estimator and show that it is minimax-optimal. We develop Wasserstein bounds between the distribution of the plug-in estimator and the true distribution as a function of $n$ and $\theta$, illustrating regimes where the output and the desired true distribution are close. Further, under some assumptions, we prove the TV-accuracy of running diffusion on these tilted samples. Our theoretical results are supported by extensive simulations. Applications of our work include finance, weather and climate modelling, and many other domains, where the aim may be to generate samples from a tilted distribution that satisfies practically motivated moment constraints.
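
One way to picture a plug-in tilted sampler is self-normalized reweighting of the empirical distribution. The sketch below assumes exponential tilting, with weights proportional to exp(theta . x); the abstract does not state the tilt family or the exact estimator, so the form and the names `tilt_weights` and `resample_tilted` are hypothetical.

```python
import math
import random

def tilt_weights(samples, theta):
    # Self-normalized weights proportional to exp(theta . x_i) (assumed exponential tilt).
    logits = [sum(t * xi for t, xi in zip(theta, x)) for x in samples]
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logits]
    total = sum(w)
    return [wi / total for wi in w]

def resample_tilted(samples, theta, k, seed=0):
    # Draw k points from the reweighted empirical distribution.
    rng = random.Random(seed)
    return rng.choices(samples, weights=tilt_weights(samples, theta), k=k)
```

With `theta` set to the zero vector the weights are uniform and the resampler reduces to an ordinary bootstrap draw. Larger components of `theta` shift mass toward samples with larger coordinates in those directions.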

artificial intelligence, diffusion, machine learning, (16 more...)

arXiv: 2604.03015

Country:

Africa > Rwanda > Kigali > Kigali (0.04)
North America > United States > Utah (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry: Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

BVFLMSP: Bayesian Vertical Federated Learning for Multimodal Survival with Privacy

Kar, Abhilash, Saha, Basisth, Sen, Tanmay, Pradhan, Biswabrata

arXiv.org Machine Learning, Apr-3-2026

Multimodal time-to-event prediction often requires integrating sensitive data distributed across multiple parties, making centralized model training impractical due to privacy constraints. At the same time, most existing multimodal survival models produce single deterministic predictions without indicating how confident the model is in its estimates, which can limit their reliability in real-world decision making. To address these challenges, we propose BVFLMSP, a Bayesian Vertical Federated Learning (VFL) framework for multimodal time-to-event analysis based on a Split Neural Network architecture. In BVFLMSP, each client independently models a specific data modality using a Bayesian neural network, while a central server aggregates intermediate representations to perform survival risk prediction. To enhance privacy, we integrate differential privacy mechanisms by perturbing client-side representations before transmission, providing formal privacy guarantees against information leakage during federated training. We first evaluate our Bayesian multimodal survival model against widely used single-modality survival baselines and the centralized multimodal baseline MultiSurv. Across multimodal settings, the proposed method shows consistent improvements in discrimination performance, with up to 0.02 higher C-index compared to MultiSurv. We then compare federated and centralized learning under varying privacy budgets across different modality combinations, highlighting the tradeoff between predictive performance and privacy. Experimental results show that BVFLMSP effectively integrates multimodal data, improves survival prediction over existing baselines, and remains robust under strict privacy constraints while providing uncertainty estimates.
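
The step of perturbing client-side representations before transmission can be sketched as norm clipping followed by additive Gaussian noise. This is a common pattern and only an assumption here: the abstract does not name the mechanism, and the helper names `clip_l2` and `privatize` and the parameters `max_norm` and `noise_std` are hypothetical.

```python
import math
import random

def clip_l2(vec, max_norm):
    # Scale the vector down so its L2 norm is at most max_norm (bounds sensitivity).
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]

def privatize(representation, max_norm, noise_std, rng):
    # Clip, then add independent Gaussian noise to each coordinate before sending.
    clipped = clip_l2(representation, max_norm)
    return [v + rng.gauss(0.0, noise_std) for v in clipped]

rng = random.Random(0)
sent = privatize([3.0, 4.0], max_norm=1.0, noise_std=0.5, rng=rng)
```

The noise scale required for a particular privacy budget is not derived in this sketch. A larger `noise_std` corresponds to stronger privacy at the cost of noisier inputs to the server, which is the tradeoff the abstract reports studying.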

artificial intelligence, learning, machine learning, (18 more...)

arXiv: 2604.02248

Country: Asia > India > West Bengal > Kolkata (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Banking & Finance (0.68)
Information Technology > Security & Privacy (0.46)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Trust Region Constrained Bayesian Optimization with Penalized Constraint Handling

Chowdhury, Raju, Sen, Tanmay, Bhuyan, Prajamitra, Pradhan, Biswabrata

arXiv.org Machine Learning, Mar-26-2026

Constrained optimization in high-dimensional black-box settings is difficult due to expensive evaluations, the lack of gradient information, and complex feasibility regions. In this work, we propose a Bayesian optimization method that combines a penalty formulation, a surrogate model, and a trust region strategy. The constrained problem is converted to an unconstrained form by penalizing constraint violations, which provides a unified modeling framework. A trust region restricts the search to a local region around the current best solution, which improves stability and efficiency in high dimensions. Within this region, we use the Expected Improvement acquisition function to select evaluation points by balancing improvement and uncertainty. The proposed Trust Region method integrates penalty-based constraint handling with local surrogate modeling. This combination enables efficient exploration of feasible regions while maintaining sample efficiency. We compare the proposed method with state-of-the-art methods on synthetic and real-world high-dimensional constrained optimization problems. The results show that the method identifies high-quality feasible solutions with fewer evaluations and maintains stable performance across different settings.
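
The three ingredients named above (penalty formulation, Expected Improvement, trust region) can be sketched as follows. The linear penalty on violations of constraints written as g_i(x) <= 0, the box-shaped trust region, and all function names are assumptions for illustration; the abstract does not give the paper's exact forms. The Expected Improvement expression is the standard closed form for minimization under a Gaussian posterior.

```python
import math

def penalized(objective, constraints, x, rho=10.0):
    # f(x) + rho * total violation, for constraints of the form g_i(x) <= 0.
    violation = sum(max(0.0, g(x)) for g in constraints)
    return objective(x) + rho * violation

def expected_improvement(mu, sigma, best):
    # EI for minimization when the surrogate predicts N(mu, sigma^2) at a point.
    if sigma <= 0.0:
        return max(0.0, best - mu)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf

def in_trust_region(x, center, radius):
    # Box trust region: every coordinate within radius of the current best point.
    return all(abs(xi - ci) <= radius for xi, ci in zip(x, center))
```

In a loop built from these pieces, candidates would be restricted to points where `in_trust_region` holds, scored by `expected_improvement` using a surrogate fitted to `penalized` values, and the best-scoring candidate evaluated next.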

artificial intelligence, optimization, optimization problem, (15 more...)

arXiv: 2603.24567

Country:

Asia > India > West Bengal > Kolkata (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

rSDNet: Unified Robust Neural Learning against Label Noise and Adversarial Attacks

Jana, Suryasis, Ghosh, Abhik

arXiv.org Machine Learning, Mar-19-2026

Neural networks are central to modern artificial intelligence, yet their training remains highly sensitive to data contamination. Standard neural classifiers are trained by minimizing the categorical cross-entropy loss, corresponding to maximum likelihood estimation under a multinomial model. While statistically efficient under ideal conditions, this approach is highly vulnerable to contaminated observations, including label noise corrupting supervision in the output space and adversarial perturbations inducing worst-case deviations in the input space. In this paper, we propose a unified and statistically grounded framework for robust neural classification that addresses both forms of contamination within a single learning objective. We formulate neural network training as a minimum-divergence estimation problem and introduce rSDNet, a robust learning algorithm based on the general class of $S$-divergences. The resulting training objective inherits robustness properties from classical statistical estimation, automatically down-weighting aberrant observations through model probabilities. We establish essential population-level properties of rSDNet, including Fisher consistency, classification calibration implying Bayes optimality, and robustness guarantees under uniform label noise and infinitesimal feature contamination. Experiments on three benchmark image classification datasets show that rSDNet improves robustness to label corruption and adversarial attacks while maintaining competitive accuracy on clean data. Our results highlight minimum-divergence learning as a principled and effective framework for robust neural classification under heterogeneous data contamination.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv: 2603.17628

Country:

Asia > India > West Bengal > Kolkata (0.40)
Europe > Spain (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (0.70)
Government > Military (0.61)

Estimating Staged Event Tree Models via Hierarchical Clustering on the Simplex

Shoaib, Muhammad, Riccomagno, Eva, Leonelli, Manuele, Varando, Gherardo

arXiv.org Machine Learning, Mar-17-2026

Staged tree models enhance Bayesian networks by incorporating context-specific dependencies through a stage-based structure. In this study, we present a new framework for estimating staged trees using hierarchical clustering on the probability simplex, utilizing simplex-based divergences. We conduct a thorough evaluation of several distance and divergence metrics, including Total Variation, Hellinger, Fisher, and Kaniadakis, alongside various linkage methods such as Ward.D2, average, complete, and McQuitty. Our simulation experiments reveal that Total Variation, especially when combined with Ward.D2 linkage, consistently produces staged trees with better model fit, structure recovery, and computational efficiency. We assess performance using the relative Bayesian Information Criterion (BIC) and Hamming distance. Our findings indicate that although Backward Hill Climbing (BHC) delivers competitive outcomes, it incurs a significantly higher computational cost. On the other hand, Total Variation divergence with Ward.D2 linkage achieves similar performance while providing significantly better computational efficiency, making it a more viable option for large-scale or time-sensitive tasks.
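
Two of the simplex distances named above have short closed forms, sketched here together with a helper that picks the closest pair of distributions, as a single agglomerative step would. The helper `closest_pair` is hypothetical and deliberately simple; it does not implement Ward.D2 or any of the other linkage rules evaluated in the paper.

```python
import math

def total_variation(p, q):
    # Half the L1 distance between two probability vectors.
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger(p, q):
    # Hellinger distance, normalized to lie in [0, 1].
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def closest_pair(dists, distance):
    # Indices (i, j), i < j, of the two distributions with the smallest distance.
    pairs = [(i, j) for i in range(len(dists)) for j in range(i + 1, len(dists))]
    return min(pairs, key=lambda ij: distance(dists[ij[0]], dists[ij[1]]))
```

For example, given the conditional distributions attached to several tree vertices, `closest_pair(dists, total_variation)` identifies the two vertices that would be the first candidates for merging into a common stage.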

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv: 2603.15568

Country:

North America > United States > Wisconsin (0.04)
North America > United States > New York (0.04)
North America > United States > Kansas (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Bayesian Conservative Policy Optimization (BCPO): A Novel Uncertainty-Calibrated Offline Reinforcement Learning with Credible Lower Bounds

Chatterjee, Debashis

arXiv.org Machine Learning, Mar-16-2026

Offline reinforcement learning (RL) aims to learn decision policies from a fixed batch of logged transitions, without additional environment interaction. Despite remarkable empirical progress, offline RL remains fragile under distribution shifts: value-based methods can overestimate the value of unseen actions, yielding policies that exploit model errors rather than genuine long-term rewards. We propose Bayesian Conservative Policy Optimization (BCPO), a unified framework that converts epistemic uncertainty into provably conservative policy improvement. BCPO maintains a hierarchical Bayesian posterior over environment/value models, constructs a credible lower bound (LCB) on action values, and performs policy updates under explicit KL regularization toward the behavior distribution. This yields an uncertainty-calibrated analogue of conservative policy iteration in the offline regime. We provide a finite-MDP theory showing that the pessimistic fixed point lower-bounds the true value function with high probability and that KL-controlled updates improve a computable return lower bound. Empirically, we verify the methodology on a real offline replay dataset for the CartPole benchmark obtained via the d3rlpy ecosystem, and report diagnostics that link uncertainty growth and policy drift to offline instability, motivating principled early stopping and calibration.
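
A rough sketch of the two steps described above, for a single state with finitely many actions. The LCB is taken here as posterior mean minus `k` posterior standard deviations, which is an assumed form, and the function names are hypothetical. The policy update uses the known closed-form maximizer of expected value minus a temperature-weighted KL penalty toward the behavior policy, which is proportional to behavior probability times exp(value / temperature).

```python
import math
import statistics

def lower_credible_bound(q_samples, k=1.0):
    # Assumed LCB: posterior mean minus k posterior standard deviations.
    return statistics.fmean(q_samples) - k * statistics.pstdev(q_samples)

def kl_regularized_policy(behavior_probs, lcb_values, temperature=1.0):
    # Maximizes E_pi[LCB] - temperature * KL(pi || behavior) over distributions pi.
    # Assumes every behavior probability is strictly positive.
    logits = [math.log(b) + q / temperature for b, q in zip(behavior_probs, lcb_values)]
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logits]
    total = sum(w)
    return [wi / total for wi in w]
```

A large `temperature` keeps the updated policy close to the behavior distribution, while a small one concentrates it on the action with the highest lower bound. Actions whose value estimates are uncertain are penalized through the standard-deviation term.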

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv: 2603.12284

Country: